Abstract


This paper examines the problem of communicating an n-bit data item from a
client to a server, where the data is drawn from a distribution D that is known
to the server but not to the client. Since this question is motivated by
asymmetric communication channels, our primary goal is to limit the number of
bits transmitted by the client. We present several protocols in which the
expected numbers of bits transmitted by the server and client are O(n) and
O(H(D)), respectively, where H(D) is the entropy of D, and can thus be
significantly smaller than n. Shannon's Theorem implies that these protocols
are optimal in terms of the number of bits sent by the client. The expected
number of rounds of communication between the server and client in the simplest
of our protocols is O(H(D)). We also give a protocol for which the expected
number of rounds is only O(1), but which requires more computational effort on
the part of the server. A third protocol provides a tradeoff between the
computational effort and the number of rounds. These protocols are complemented
by several lower bounds and impossibility results. We show that all of our
protocols are existentially optimal in terms of the number of bits sent by the
server, i.e., there are distributions for which the total number of bits
exchanged has to be at least n − 1. In addition, we show that there is no
protocol that is optimal for every distribution (as opposed to just
existentially optimal) in terms of bits sent by the server. We demonstrate this
by proving that it is undecidable to compute, for an arbitrary distribution D,
the minimum expected total number of bits sent by the server and client.
Furthermore, the problem remains undecidable even if only an approximate
solution is required, for any reasonable degree of approximation.


1  Introduction 


In the summer of 1995, the second author set out to establish a high-speed
wireless internet connection between the Carnegie Mellon campus and his home
approximately one mile away. A directional antenna was mounted above the
tallest tower on campus, and a matching antenna was installed at home. The
antennas were driven by WaveLAN transceivers, which implement 2 megabit/second
wireless ethernet. The installation was successful, but within a few months the
performance of the wireless connection deteriorated to the point that it was no
longer usable. This problem coincided with the deployment of a campus-wide
wireless network intended to provide laptop users with uninterrupted access as
they roamed the campus [6]. Unfortunately, this network used the same WaveLAN
technology and carrier frequency, and as a result, packets traveling from
home to campus were often lost in transit.

Ultimately, the goal of establishing a high-speed bidirectional wireless
connection was abandoned, and packets were instead routed from home to campus
across an ordinary telephone line using a 28.8 kilobit/sec modem at each end.
The resulting connection was highly asymmetric, with download speed exceeding
upload speed by a factor of 69-to-1. The connection proved adequate, however,
to support an X-terminal and web browser at home, and has been in daily use
for several years. Some tasks, however, are limited by the slow upload speed.

In the past two years a number of commercial asymmetric networking
technologies have also been introduced. For example, using ordinary telephone
lines, 56k modems can download at up to 56 kb/s, but can upload data at a
maximum rate of 33.6 kb/s. In some cities, telephone companies are now offering
much more asymmetric network connections. For example, in Pittsburgh trials of
asymmetric digital subscriber loops (ADSL) have begun. These ADSLs provide
a download speed of 1.5 Mb/s, and an upload speed of 64 kb/s. As another
example, the DirecPC network connection offered by Hughes beams data down
from a satellite to the user's home at 400 kb/s, and the user sends data back
using an ordinary phone line (at 33.6 kb/s). Internet access provided through
cable-television networks is also typically asymmetric. In the Boston area, for
example, MediaOne is offering service with a download rate of 1.5 Mb/s and an
upload rate of 300 kb/s. Independent of whether the underlying medium is
asymmetric, it has been observed that home users typically download much more
data than they upload.

This paper aims to address the limitations of asymmetric network connections
by examining the following question. Is it possible to use a high-speed
downlink to improve the performance of a low-speed uplink? Perhaps
surprisingly, in several natural situations the answer is yes. To be more
precise, suppose that a client at the end of the downlink has an n-bit data
item to send to a server at the end of the uplink. We show that in certain
circumstances, the server can use the high-speed downlink to reduce the
expected number of bits sent by the client across the low-speed uplink to
significantly less than n.

1.1 Reducing the number of bits sent by the client

We study an asynchronous model based on Yao's two-party communication
complexity model [14]. In order for the client to transmit its n-bit data item to
the server, the client and the server communicate bits to each other, as specified
by some fixed protocol P. At each step of the protocol, P specifies whether the
client or the server sends the next bit, as well as the value of that bit. A bit
sent by the client can only depend on the bits sent thus far by the server and
the information known to the client at the start of the protocol. The analogous
requirement holds for the server. When the protocol terminates, the server must
have enough information to determine with certainty the n-bit data item.

It  is  already  well  known,  and  not  difficult  to  prove,  that  if  the  server  has 
no  information  about  the  n-bit  data  item  held  by  the  client,  then  in  the  worst 
case the number of bits sent by the client must be at least n. This
information-theoretic lower bound would seem to imply that there is no way to
exploit the high-speed downlink. There are many circumstances, however, in which
the server has some information about the data item held by the client. For
example, if the client is sending a sequence of keystrokes to the server, the
server may know the frequency with which the client presses any particular key.
Throughout  this  paper  we  assume  that  the  n-bit  data  item  held  by  the  client  has 
been  drawn  randomly  from  a  probability  distribution,  and  that  this  distribution 
is  known  to  the  server. 

In  the  keystrokes  example,  it  is  reasonable  to  assume  that  both  the  client  and 
the  server  know  the  distribution,  since  both  have  seen  a  history  of  keystrokes 
made  by  the  client.  In  this  case,  the  client  and  the  server  can  agree  on  a  data 
compression  protocol  for  the  client  to  use  in  encoding  its  data.  For  example, 
suppose that the client uses a static Huffman-coding scheme [7]. Then the data
can be transferred in one round with no bits sent by the server and with at
most H(D) + 1 expected bits sent by the client, where H(D) is the entropy of
the distribution D, a quantity that varies between 0 and n, and is given by the
equation

$$H(D) = \sum_{i=1}^{2^n} p_i \log_2 \frac{1}{p_i},$$

where x_i is an n-bit string and p_i is the probability of x_i. Notice that
because both the client and the server know the distribution, the high-speed
downlink is not utilized in this example.
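
As a small numerical illustration of the H(D) + 1 bound, the following Python sketch computes the entropy of a toy distribution and the expected codeword length of a Huffman code built for it. The function names and the example distribution are illustrative assumptions, not part of the paper.

```python
import heapq
import math

def entropy(dist):
    """Shannon entropy, in bits, of a dict mapping strings to probabilities."""
    return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)

def huffman_lengths(dist):
    """Return {string: codeword length} for a static Huffman code on dist."""
    if len(dist) == 1:
        return {x: 1 for x in dist}  # a lone symbol still gets a one-bit codeword
    # Heap entries: (weight, unique tie-breaker, strings in this subtree).
    heap = [(p, i, [x]) for i, (x, p) in enumerate(dist.items())]
    heapq.heapify(heap)
    lengths = {x: 0 for x in dist}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for x in s1 + s2:
            lengths[x] += 1  # merging two subtrees adds one bit to every leaf below
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

# Hypothetical distribution over 2-bit strings.
D = {"00": 0.5, "01": 0.25, "10": 0.125, "11": 0.125}
H = entropy(D)
L = huffman_lengths(D)
expected_len = sum(D[x] * L[x] for x in D)
print(H, expected_len)
```

For this dyadic distribution the two quantities coincide; in general the expected Huffman length lies between H(D) and H(D) + 1.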

But what if only the server knows the distribution? Throughout this paper,
we assume that the client does not know the distribution. Although this
seems counterintuitive at first, there are natural situations consistent with this
assumption. As an example, suppose that the client has filled out a form on an
internet web page, and is to send its response to a server. The server, which has
seen many replies, may know the distribution, but the client is unlikely to. As
another example, suppose that each client is a probe that is designed to take an
experimental sample and report it back to the server. The probe will not have
access to the samples taken by other probes, and may not be able to make a
long transmission back to the server.

1.2  Our  results 

In general, we characterize a protocol in terms of four parameters, (σ, φ, λ, ρ),
where σ is the expected number of bits sent by the server, φ is the expected
number of bits sent by the client, ρ is the expected number of rounds (defined
below), and λ is the expected computational effort expended by the server (also
defined below). These parameters are functions of n, the number of bits in the
data item, and H(D), the entropy of the distribution D.

A  round  of  the  protocol  is  defined  to  be  a  maximal  set  of  consecutive  bits 
sent  by  the  server  (without  any  bits  sent  in  between  by  the  client),  followed  by  a 
maximal  set  of  consecutive  bits  sent  by  the  client.  All  the  bits  sent  by  the  server 
(or  client)  in  a  round  can  be  transmitted  without  waiting  for  a  response  from 
the  client  (or  server,  respectively),  and  thus  minimizing  the  number  of  rounds 
required  by  a  protocol  is  an  important  consideration  in  many  scenarios.  For 
example,  this  is  the  case  when  the  time  required  for  a  round-trip  communication 
is  large  and  does  not  depend  on  the  number  of  bits  communicated. 
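
To make the definition of a round concrete, here is a minimal Python sketch that counts rounds in a transcript. The encoding of a transcript as a string of 'S' (server bit) and 'C' (client bit) characters is an assumption made for illustration, as is the convention that a round may have an empty server part, as in the Huffman example above where the server sends nothing.

```python
from itertools import groupby

def count_rounds(senders):
    """Count rounds in a transcript such as 'SSSCSSC'.

    A round is a maximal run of server bits followed by a maximal run of
    client bits; either run is allowed to be empty in this sketch.
    """
    runs = [who for who, _ in groupby(senders)]  # collapse consecutive bits by sender
    rounds = 0
    i = 0
    while i < len(runs):
        if runs[i] == "S":                       # server part of the round
            i += 1
        if i < len(runs) and runs[i] == "C":     # client part of the round
            i += 1
        rounds += 1
    return rounds

print(count_rounds("SSSCSSC"))
```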

The computational effort λ expended by the server is quantified as follows.
We assume that the server has access to the distribution on the string held
by the client via a black box that answers queries of the form “What is the
cumulative probability of 4-bit data items matching the pattern 0*11?” In this
example n is 4, and the server is asking for the sum of the probabilities of the
strings 0011 and 0111. The parameter λ is simply the number of such queries
made by the server. In all of our protocols, the additional computation time
required of the server in the random-access machine (RAM) model [1] is at most
O(λ log λ). The computation time required by the client is at most O(n).
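
The black box can be pictured with a short Python sketch. Representing the distribution as an explicit dictionary, the `make_black_box` helper, and the query counter are all assumptions for illustration; the paper treats the box as an opaque oracle.

```python
from fractions import Fraction
from itertools import product

def make_black_box(dist, n):
    """Return (query, calls): query(pattern) sums the probability of all n-bit
    strings matching a pattern over '0', '1' and the wildcard '*'."""
    calls = {"count": 0}  # number of queries made so far

    def query(pattern):
        assert len(pattern) == n
        calls["count"] += 1
        return sum(p for x, p in dist.items()
                   if all(c == "*" or c == b for c, b in zip(pattern, x)))

    return query, calls

# Hypothetical example: the uniform distribution on 4-bit strings.
uniform = {"".join(bits): Fraction(1, 16) for bits in product("01", repeat=4)}
query, calls = make_black_box(uniform, 4)
answer = query("0*11")  # probability of 0011 plus probability of 0111
print(answer, calls["count"])
```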

We  require  that  upon  the  termination  of  a  protocol,  the  server  knows  the 
n-bit  data  item  held  by  the  client  with  certainty,  i.e.,  there  is  no  probability  that 
the  server  incorrectly  identifies  the  data  item.  The  number  of  bits  transmitted 
by  both  parties,  the  number  of  rounds,  and  the  computational  effort  expended 
by  the  server,  however,  may  all  be  random  variables  taken  over  the  distribution 
of  data  items  and  over  random  choices  made  by  the  algorithms  underlying  the 
protocols. 

We begin by describing a (3n, 1.71H(D) + 1, 3n, 1.71H(D) + 1) protocol. We
call this protocol Computation-efficient because the expected number of
black-box queries performed by the server is asymptotically optimal.

Next, we present an (O(n), O(H(D) + 1), 2^n, O(1)) protocol. This protocol
is called Round-efficient, because the expected number of rounds required is
only a constant, which is optimal.

Our third protocol, Computation-Rounds-Tradeoff(c), allows us to achieve
a tradeoff between the expected number of black box queries and the expected
number of rounds required. For any positive integer c between 1 and n,
Computation-Rounds-Tradeoff(c) is an (O(n), O(H(D) + 1), …) protocol.

It  is  worth  noting  that  the  actual  speeds  of  the  downlink  and  uplink  do 
not  appear  in  our  analyses  of  these  protocols.  Although  these  protocols  were 
motivated  by  networks  with  asymmetric  transfer  speeds,  in  fact  they  can  be 
applied  in  any  situation  in  which  it  is  desirable  to  reduce  the  number  of  bits 
transmitted  by  the  client,  provided  that  the  server  knows  the  distribution,  but 
the  client  does  not. 

We  complement  these  upper  bounds  by  proving  a  number  of  impossibility 
results  and  lower  bounds. 

We begin by observing that all three of our protocols are asymptotically
optimal in the number of bits sent by the client, O(H(D)). Next, we show that
they are also all existentially asymptotically optimal in the number of bits sent
by the client and the server together, O(n). By existentially, we mean that for
any h, there are distributions with entropy H(D) ≈ h for which the expected
total number of bits sent by both parties must be Ω(n).

We  also  show  that  there  is  no  protocol  that  is  optimal  for  every  distribution, 
as  opposed  to  just  existentially  optimal,  in  terms  of  bits  sent  by  the  server. 

This follows from a proof that it is undecidable to compute, for an arbitrary
distribution D, the value OPT(D), the minimum expected total number of
bits that the server and client must exchange in order to solve the problem.
Furthermore, the problem remains undecidable even if only an approximate
solution is required. For example, computing a value that is guaranteed to be
between the inverse of Ackermann's function applied to OPT(D) and Ackermann's
function applied to OPT(D) is undecidable.

We  conclude  by  showing  that  the  number  of  black-box  queries  performed 
by  the  server  in  protocol  Computation-efficient  is  asymptotically  optimal. 

In particular, we show that for any entropy h, there is a distribution D with
entropy H(D) ≈ h for which the sum of the number of bits sent by the client plus
the number of black-box queries is at least n. We also show that for any
single-round protocol in which the client sends O(H(D)) bits, there are
distributions for which the server must send an exponential number of bits.

1.3  Previous  work  and  related  work 

There  is  a  wealth  of  literature  on  two-party  communication  complexity.  Most  of 
this  work,  however,  examines  symmetric  communication  channels,  and  analyzes 
the  total  number  of  bits  transmitted  by  the  two  parties,  and  sometimes  the 
number  of  rounds.  A  good  reference  is  the  recent  book  by  Kushilevitz  and 
Nisan  [8]. 

There  is  relatively  little  work  on  asymmetric  communication  complexity. 

One  notable  exception  is  a  body  of  work  connecting  asymmetric  communication 
complexity  to  lower  bounds  on  the  time  to  perform  operations  on  various  data 
structures [11, 12]. The last paper is most closely related to this one. It presents
a number of general techniques for proving tradeoffs between the number of bits
sent by the server and the number of bits sent by the client. It then applies
these techniques to several fundamental problems. As an example, one of the
problems it considers is the membership problem. In this problem the server
holds a set S of strings, and the client holds a single string x. The goal of both
parties is to determine if x belongs to S. The paper also examines generalizations
of the membership problem such as the disjointness problem. In this problem
the server and client hold sets S and C, and the goal is to determine if the
sets are disjoint. Other problems include the span problem, in which the server
holds the basis of a vector space, the client holds a vector, and the goal is to
determine if the client's vector lies in the server's space, and the greater than
problem, in which the server and client each hold an integer, and the goal is to
determine which integer is larger.

A  recent  study  has  shown  that  in  practice,  even  if  the  flow  of  data  is  entirely 
downstream,  the  overall  rate  at  which  data  can  be  transferred  in  asymmetric 
networks  may  be  limited  by  the  upload  speed  [2].  The  explanation  for  this  is 
that  in  the  TCP  protocol,  acknowledgments  must  be  sent  upstream  for  all  data 
that  travels  downstream,  and  the  flow  of  data  will  stall  if  the  acknowledgments 
cannot  keep  up. 

2  Upper  Bounds 

In  this  section  we  provide  three  protocols.  All  three  are  within  a  constant 
fraction  of  optimal  in  terms  of  the  number  of  bits  sent  by  the  client.  They 
are  also  all  existentially  optimal  in  the  number  of  bits  sent  by  the  server.  The 
first  is  asymptotically  optimal  in  terms  of  the  number  of  black  box  queries 
required,  the  second  is  asymptotically  optimal  in  terms  of  the  expected  number 
of  rounds  required,  and  the  third  allows  us  to  achieve  a  tradeoff  between  black 
box queries and the number of rounds required. In the following, let D represent
the distribution known to the server, and let D(x) be the probability assigned
to the string x by the distribution D.

Protocol  Computation-efficient 

In  this  protocol,  the  server  sends  the  client  queries  consisting  of  candidate 
prefixes  for  the  client’s  string,  and  the  client  responds  positively  or  negatively 
to  these  queries.  The  server  keeps  track  of  the  responses,  and  this  information 
allows  the  server  to  remove  strings  from  consideration.  Future  queries  to  the 
client  depend  on  the  client’s  previous  responses.  In  order  to  do  this  efficiently, 
the  results  of  black  box  queries  are  adjusted  from  the  a  priori  probability  of 
a  string  occurring,  to  reflect  the  information  learned  from  the  client  thus  far. 
Given a set of excluded strings X, and p_Q, the result of a black box query Q,
p_Q can be adjusted to reflect that the actual string cannot be in the set X by
first subtracting the weight of all strings in X that are consistent with Q, and
then dividing the result by the weight of all the strings not in X. We call the
resulting value the exclusion adjusted probability.
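
The adjustment just described can be sketched in Python as follows. This is a minimal illustration that assumes the distribution is available as a dictionary and that the excluded set X is given explicitly; the helper names `matches` and `exclusion_adjusted` are hypothetical.

```python
from fractions import Fraction

def matches(pattern, x):
    """True if string x is consistent with a pattern over '0', '1', '*'."""
    return all(c == "*" or c == b for c, b in zip(pattern, x))

def exclusion_adjusted(p_q, pattern, excluded, dist):
    """Adjust the raw black-box answer p_q for `pattern`, given excluded strings."""
    # Step 1: subtract the weight of excluded strings consistent with the query.
    removed = sum(dist[x] for x in excluded if matches(pattern, x))
    # Step 2: divide by the weight of all strings that are not excluded.
    remaining = 1 - sum(dist[x] for x in excluded)
    return (p_q - removed) / remaining

# Hypothetical distribution over 2-bit strings.
D = {"00": Fraction(1, 2), "01": Fraction(1, 4),
     "10": Fraction(1, 8), "11": Fraction(1, 8)}
raw = D["00"] + D["01"]                       # black-box answer for the pattern 0*
adjusted = exclusion_adjusted(raw, "0*", {"00"}, D)
print(adjusted)
```

Here the raw answer 3/4 becomes 1/2 once the string 00 is known to be excluded.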

The  protocol  is  defined  as  follows: 

Repeat  the  following  until  the  entire  string  is  known: 

•  Conditioning  on  all  information  learned  from  the  client  thus  far,  the  server 
finds  a  prefix  of  the  unknown  bits  as  follows: 

-  Let  s  be  the  empty  string. 

-  The server repeats the following until it has a prefix that occurs with
probability between 1/3 and 2/3, or that extends to the end of the string.

*  Query the black box for s0 occurring starting in the first unknown
bit position.

*  If the exclusion adjusted probability of the value returned by the
black box is at least 1/3, then a 0 is appended to s.

*  If the exclusion adjusted probability of the value returned by the
black box is less than 1/3, then a 1 is appended to s.

•  The server sends this prefix to the client.

•  If the prefix matches the client’s string exactly, the client responds with a
“y”; otherwise the client responds with an “n”.

Note that the prefix sent always either extends to the end of the string, or
occurs with probability between 1/3 and 2/3, since when a prefix that occurs with
probability p > 2/3 is extended by one bit, the prefix with the more likely of the
two settings for that bit occurs with probability at least 1/3.
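
The steps of protocol Computation-efficient can be simulated with the following Python sketch. It is an illustration under several assumptions: the distribution is an explicit dictionary of exact `Fraction` probabilities, `prefix_weight` stands in for the black box, ties at exactly 1/3 extend with a 0, and the client's string has positive probability. Query counting is omitted.

```python
from fractions import Fraction

def prefix_weight(dist, prefix):
    """Stand-in for the black box: a priori probability of strings starting with prefix."""
    return sum(p for x, p in dist.items() if x.startswith(prefix))

def computation_efficient(dist, n, client_string):
    """Simulate the protocol; returns (string learned, server bits, client bits)."""
    known = ""      # prefix of the client's string confirmed so far
    excluded = []   # prefixes answered "n"; no entry extends another
    server_bits = client_bits = 0

    def weight(prefix):
        # A priori weight minus the weight of excluded strings consistent with prefix.
        return prefix_weight(dist, prefix) - sum(
            prefix_weight(dist, e) for e in excluded if e.startswith(prefix))

    while len(known) < n:
        total = weight(known)            # weight of everything still possible
        s, p_s = known, Fraction(1)      # candidate prefix, its adjusted probability
        while len(s) < n and p_s > Fraction(2, 3):
            p0 = weight(s + "0") / total   # exclusion adjusted probability of s0
            if p0 >= Fraction(1, 3):
                s, p_s = s + "0", p0
            else:
                s, p_s = s + "1", p_s - p0
        server_bits += len(s) - len(known)   # only the new bits are sent
        client_bits += 1                     # one "y" or "n" answer
        if client_string.startswith(s):
            known = s
        else:
            # Drop exclusions subsumed by the new, shorter one.
            excluded = [e for e in excluded if not e.startswith(s)] + [s]
    return known, server_bits, client_bits

D = {"00": Fraction(1, 2), "01": Fraction(1, 4),
     "10": Fraction(1, 8), "11": Fraction(1, 8)}
result = computation_efficient(D, 2, "11")
print(result)
```

On this toy distribution the least likely string costs the client four answer bits, while the most likely string costs one.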

Theorem 1 For any distribution D, protocol Computation-efficient is a
(3n, 1.71H(D) + 1, 3n, 1.71H(D) + 1) protocol.

Proof: We first show that the expected number of bits sent by the client
is O(H(D) + 1). For any input distribution D, we model the bits sent by
the client as a tree, where each leaf of the tree represents a string held by the
client. Each left branch of the tree represents a “y” response by the client and
each right branch of the tree represents a response of “n”. In this tree, the
probability of the protocol reaching any leaf x_i is exactly D(x_i).

The choice of prefix that the server sends to the client gives us the following
important fact. At every internal node of the tree, the right branch occurs
with probability ≤ 2/3, and the left branch either occurs with probability ≤ 2/3 or
represents an affirmative answer to a prefix that extends to the end of the string
(which is a leaf of the tree). Thus, along any path from the root to a leaf, there
is at most one branch that occurs with probability > 2/3. Therefore, the depth of
leaf x_i is at most 1 + log_{2/3} D(x_i). This implies that the expected number of
bits sent by the client is at most

$$\sum_i D(x_i)\left(1 + \log_{2/3} D(x_i)\right) = 1 + \frac{H(D)}{\log_2(3/2)} \approx 1 + 1.71\,H(D).$$
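
The constant 1.71 in the theorem comes from a change of logarithm base. Written out, with the entropy H(D) measured in bits:

```latex
\log_{2/3} D(x_i) = \frac{\log_2 D(x_i)}{\log_2 (2/3)}
                  = \frac{\log_2 \bigl(1/D(x_i)\bigr)}{\log_2 (3/2)},
\qquad
\sum_i D(x_i)\,\log_{2/3} D(x_i) = \frac{H(D)}{\log_2 (3/2)},
\qquad
\frac{1}{\log_2 (3/2)} \approx 1.7095 .
```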

The bound on the expected number of rounds follows from the fact that the
client sends one bit in each round. To see that the expected number of bits sent
by the server is at most 3n, let E_i be the a priori expected number of possible
matches sent by the server for the i-th bit of the string held by the client. For
every prefix sent by the server, the probability of a successful match is at least
1/3. Therefore, E_i ≤ 3, and the result follows from the linearity of expectation.
The bound on the number of black box queries follows from the fact that each
bit sent by the server corresponds to a single black box query. ■
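
One way to see the bound of 3 on E_i, sketched under the assumption that each prefix covering bit i is rejected with probability at most 2/3 regardless of the earlier answers: bit i is sent again only after every earlier prefix covering it has been rejected, so

```latex
E_i \;\le\; \sum_{t \ge 0} \Pr\bigl[\text{the first } t \text{ prefixes covering bit } i \text{ are all rejected}\bigr]
    \;\le\; \sum_{t \ge 0} \left(\tfrac{2}{3}\right)^{t} \;=\; 3 .
```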

The  next  protocol  uses  only  a  constant  expected  number  of  rounds,  but  at 
the  cost  of  a  larger  number  of  black  box  queries. 

Protocol  Round-efficient 

For any distribution D, let T(D) represent the strings in sorted order from
most likely to occur to least likely to occur. Let τ(D) represent a partition of
the strings into sets X_j. Set X_1 contains the first h_1 strings of T(D), where
h_1 is chosen so that h_1 > 0 and |1/2 − Σ_{i=1}^{h_1} D(x_i)| is minimized, with
x_i denoting the i-th string of T(D). In other words, set X_1 contains as close to
half the probability weight as possible. Set X_2 contains the next h_2 strings,
where h_2 is chosen so that X_2 contains as close to half the remaining
probability weight as possible, and similarly with the remainder of the sets in
the partition. Note that the last set in the partition (denoted X_r) contains
exactly 1 string. Also note that each set X_j either contains only one string, or
contains between 1/3 and 2/3 of the remaining probability weight.
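
The construction of the partition can be sketched in Python as below. The sketch assumes an explicit dictionary of strictly positive probabilities and breaks ties in favor of the smaller set; `build_partition` is a hypothetical helper name.

```python
from fractions import Fraction

def build_partition(dist):
    """Split strings, sorted by decreasing probability, into consecutive sets,
    each holding as close as possible to half of the remaining weight."""
    order = sorted(dist, key=lambda x: dist[x], reverse=True)  # the sorted order T(D)
    sets, start = [], 0
    remaining = sum(dist.values())
    while start < len(order):
        best_h, best_gap, acc = 1, None, 0
        for h in range(1, len(order) - start + 1):
            acc += dist[order[start + h - 1]]
            gap = abs(acc - remaining / 2)   # distance from half the remaining weight
            if best_gap is None or gap < best_gap:
                best_h, best_gap = h, gap
        sets.append(order[start:start + best_h])
        remaining -= sum(dist[x] for x in sets[-1])
        start += best_h
    return sets

# Hypothetical example: uniform distribution over eight 3-bit strings.
U = {format(v, "03b"): Fraction(1, 8) for v in range(8)}
parts = build_partition(U)
print([len(s) for s in parts])
```

For the uniform example the set sizes halve until a single string remains, and the last set always has exactly one string.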

We can compare the partition of the strings into the sets X_i with the
construction of a Fano code [5] (see also [4]). To construct a Fano code, the strings
are  likewise  sorted  in  order  of  probability,  and  then  divided  into  as  close  to  two 
equally  probable  sets  as  possible.  The  first  bit  of  the  codeword  is  assigned  to 
a  1  if  the  string  lies  in  the  first  set,  and  a  0  if  the  string  lies  in  the  second 
set.  However,  for  a  Fano  code,  this  same  process  is  repeated  on  both  sets  as 
many  times  as  is  possible.  In  our  construction,  we  only  subdivide  the  set  which 
contains  the  less  likely  strings.  Instead  of  subdividing  the  other  set,  we  reduce 
the  number  of  rounds  required  by  using  hashing  to  differentiate  between  strings 
in  that  set.  The  difficult  portion  of  this  technique  is  to  demonstrate  that  the 
client  can  use  hashing  in  a  manner  that  does  not  increase  the  number  of  bits 
sent  by  more  than  a  constant  factor. 

We use F_n, the family of pairwise independent hash functions where
for each F ∈ F_n we have F(x) = ax + b, where arithmetic is with respect
to the finite field GF(2^n) [3]. Here, a and b are integers chosen uniformly
and independently at random from the range [0 ... 2^n − 1], and thus the total
number of bits required to describe any F ∈ F_n is 2n. Also, note that with
this construction, for any k < n, the first k bits of F(x) also form a pairwise
independent hash function (see for example [10]).
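
For concreteness, here is a Python sketch of such a hash function for n = 8. The choice of n, the irreducible polynomial x^8 + x^4 + x^3 + x + 1 used to represent GF(2^8), and the convention that the "first" bits are the most significant ones are all assumptions made for this illustration.

```python
N = 8
IRREDUCIBLE = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)

def gf_mul(a, b):
    """Multiply two elements of GF(2^8), represented as integers in [0, 255]."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        b >>= 1
        a <<= 1
        if a & (1 << N):
            a ^= IRREDUCIBLE     # reduce modulo the irreducible polynomial
    return result

def make_hash(a, b):
    """Return F with F(x) = a*x + b over GF(2^8)."""
    return lambda x: gf_mul(a, x) ^ b

def first_bits(value, k):
    """The first (most significant) k bits of an N-bit value."""
    return value >> (N - k)

F = make_hash(3, 7)              # in the protocol, a and b are drawn at random
print(F(5), first_bits(F(5), 3))
```

Pairwise independence shows up here as the fact that, for two fixed distinct inputs, every pair of hash values arises from exactly one choice of (a, b).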

•  The server queries the black box to find D(x_i) for all possible strings x_i,
and uses this information to determine the partition τ(D). To do this, the
server sorts the strings based on D(x_i).

•  The server sends to the client a randomly chosen hash function F ∈ F_n.

•  Let i = 1 and let k = 0.

•  Repeat the following until x, the client’s string, is known by the server.

-  The server sends to the client the binary representation of
k' = ⌈log h_i⌉.

-  If k' > k, the client sends to the server bits k + 1 through k' of F(x).
Note that this is sufficient for the server to know the first k' bits of
F(x).

-  k = max(k, k').

-  The server finds all strings x' ∈ X_i such that the first k bits of F(x')
are the same as the first k bits of F(x), and sends the strings to the
client.

-  If the client sees its string in the list sent by the server, the client
sends a “y”, followed by the index of its string within the list, and
the protocol terminates.

*  Otherwise, the client sends the server an “n”.

-  If i = r − 1, then there is only one possible string remaining, and the
protocol terminates.

*  Otherwise i = i + 1.
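
The loop above can be simulated with the following Python sketch for n = 8. It rests on several assumptions: strings are represented as integers, GF(2^8) uses the polynomial x^8 + x^4 + x^3 + x + 1, the index within a list of L strings costs ⌈log2 L⌉ bits, ties in the partition favor the smaller set, and the client's string has positive probability. Only client bits and loop iterations are counted.

```python
import random
from fractions import Fraction

N = 8                  # string length in bits; strings are ints in [0, 2^N)
IRREDUCIBLE = 0x11B    # assumed representation of GF(2^8)

def gf_mul(a, b):
    """Multiply two elements of GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << N):
            a ^= IRREDUCIBLE
    return result

def build_partition(dist):
    """Consecutive sets of the sorted strings, each near half the remaining weight."""
    order = sorted(dist, key=lambda x: dist[x], reverse=True)
    sets, start, remaining = [], 0, sum(dist.values())
    while start < len(order):
        best_h, best_gap, acc = 1, None, 0
        for h in range(1, len(order) - start + 1):
            acc += dist[order[start + h - 1]]
            gap = abs(acc - remaining / 2)
            if best_gap is None or gap < best_gap:
                best_h, best_gap = h, gap
        sets.append(order[start:start + best_h])
        remaining -= sum(dist[x] for x in sets[-1])
        start += best_h
    return sets

def round_efficient(dist, x, rng):
    """Simulate the protocol; returns (string learned, client bits, iterations)."""
    sets = build_partition(dist)
    a, b = rng.randrange(1 << N), rng.randrange(1 << N)
    F = lambda v: gf_mul(a, v) ^ b           # F(v) = a*v + b over GF(2^8)
    client_bits = iterations = 0
    k = 0                                    # hash bits of F(x) known to the server
    for X in sets[:-1]:                      # the last set is never queried
        iterations += 1
        k_new = (len(X) - 1).bit_length()    # k' = ceil(log2 h_i)
        client_bits += max(0, k_new - k)     # bits k+1 .. k' of F(x)
        k = max(k, k_new)
        candidates = [v for v in X if F(v) >> (N - k) == F(x) >> (N - k)]
        client_bits += 1                     # the "y" or "n" answer
        if x in candidates:
            client_bits += (len(candidates) - 1).bit_length()  # index in the list
            return candidates[candidates.index(x)], client_bits, iterations
    return sets[-1][0], client_bits, iterations  # only one string remains

D = {v: Fraction(1, 8) for v in range(8)}    # hypothetical distribution
outcome = round_efficient(D, 7, random.Random(1))
print(outcome)
```

The recovered string never depends on the random hash function; only the number of index bits does.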

Theorem 2 Protocol Round-efficient is an (O(n), O(H(D) + 1), 2^n, O(1))
protocol.

Proof: We first bound the expected number of bits sent by the client. We
do this as follows: we introduce a code Γ, called the comparison code for the
distribution D, and show that the expected codeword length using Γ is
O(1 + H(D)). We then show that the expected number of bits sent by the client is
at most a constant factor more than the expected codeword length of Γ.

We describe the code Γ as a tree. In this tree, every left branch represents the
transmission of a “y”, every right branch represents the transmission of an “n”,
and every leaf represents a string. The subtree found by starting at the root,
taking 0 ≤ k ≤ r − 2 right branches, followed by a single left branch, contains
exactly the strings in X_{k+1}. This portion of the code Γ is identical to the bits
sent by the client. Within each subtree, we use any code with the following
property: at any internal node of the tree, either the probability of taking the
left branch is between 1/3 and 2/3 (we call such a node a balanced node), or the
branch with higher probability is a leaf of the tree. Examples of such codes are
those defined by the bits sent by the client in protocol Computation-efficient,
and Fano codes [5].

Let E(Γ) be the expected codeword length using the code Γ on a string
x_i drawn from the distribution D. Using an argument similar to the proof of
Theorem 1, we show that E(Γ) = O(1 + H(D)). As before, along any path from
the root to a leaf, there is at most one node that is not balanced. Therefore,
the depth of leaf x_i is at most 1 + log_{2/3} D(x_i). This implies that E(Γ) is at
most

$$\sum_i D(x_i)\left(1 + \log_{2/3} D(x_i)\right) = 1 + \frac{H(D)}{\log_2(3/2)} \approx 1 + 1.71\,H(D).$$

We next show that E(A), the expected number of bits sent by the client on
the distribution D, is O(E(Γ)). We first derive a lower bound for E(Γ). We
assume that there is more than 1 string x_i such that D(x_i) > 0, since when this
is not the case, the number of bits sent by the client can easily be seen to be
O(1). We derive an expression for the minimum depth of any string in X_j in Γ.
The depth of any string in X_j is at least 1; this suffices for the case where
h_j = 1. When h_j > 1, let x_m be a minimum-depth leaf in X_j such that if x_{m'},
the other child of the parent of x_m, is also a leaf, then D(x_m) ≥ D(x_{m'}). Let
v_j be the root of the subtree defined by X_j. Since there are no leaves at a
smaller depth than x_m, all the nodes on the path from v_j to x_m, with the
possible exception of the last node, are balanced. Either the last node is
balanced, or the branch taken from that node to reach x_m occurs with
probability ≥ 1/2. Thus, every branch on the path from v_j to x_m occurs with
probability ≥ 1/3.

Let q_j = Σ_{x_i ∈ X_j} D(x_i), the probability of reaching v_j. The length of the
path from v_j to x_m is at least log_3 (q_j / D(x_m)). Let m_j = max_{x_i ∈ X_j} D(x_i)
be the maximum probability of any string that appears in X_j. Since
m_j ≥ D(x_m), we see that

$$E(\Gamma) \ge \sum_{j=1}^{r} q_j \max\left(1, \log_3 \frac{q_j}{m_j}\right).$$

We next bound the expected number of bits sent by the client. The client
sends three kinds of bits: bits that represent the image of a hash function,
bits that represent a “y” or “n” answer to a list of strings sent by the server,
and, after a “y” answer, bits that represent the index of the correct string within
that list. The index is only sent once, and, since we have a pairwise independent
hash function, the expected number of strings in the list is ≤ 2. Thus, the total
expected number of index bits is at most 1. When the client finds out that the
string it holds is not in any of the first j − 1 sets X_1 ... X_{j−1}, it may need to
transmit some additional bits of the hash function image, but never more than
⌈log h_j⌉ additional bits. This occurs with probability s_j = 1 − Σ_{i=1}^{j−1} q_i,
where we define s_1 = 1. Thus, the expected number of bits sent by the client is
at most

$$E(A) = \sum_{j=1}^{r-1} s_j \left(\log h_j + O(1)\right).$$

Here,  the  0(1)  term  accounts  for  the  index  bits,  the  “y"  or  •’n"  bits,  as  wei) 
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as  the  rounding  of  log  hi .  In  order  to  compare  this  expression  with  that  derived 
for  ^(r),  we  use  the  following  facts: 

1. $s_i \le 3 q_i$.

2. When $h_i > 1$, $s_i \le 9 q_{i+1}$.

3. When $h_i > 1$, $h_i \le \frac{6 q_{i+1}}{m_{i+1}}$.

Fact 1 follows directly from the fact that in constructing the set $X_j$, we used
as close to half the remaining probability weight as possible. When $h_i > 1$,
we see that since the strings are partitioned in order from most weight to least
weight, $q_i \le \frac{2}{3} s_i$. Thus, $s_{i+1} \ge \frac{1}{3} s_i$, which combined with Fact 1, gives us Fact 2.
To prove Fact 3, note that since every string in the set $X_i$ occurs with greater
probability than any string in the set $X_{i+1}$, we have $m_{i+1} \le \frac{q_i}{h_i}$. Since $q_i \le \frac{2}{3} s_i$
combined with Fact 2 imply that $q_i \le 6 q_{i+1}$, Fact 3 follows.

To apply these facts to $E(A)$, there are two cases for each term of the summation.
When $h_i = 1$, then Fact 1 implies that $s_i(\log h_i + O(1)) = O(q_i)$.
In the case that $h_i > 1$, Facts 2 and 3 give us that $s_i(\log h_i + O(1)) \le
9 q_{i+1}\left(\log \frac{q_{i+1}}{m_{i+1}} + O(1)\right)$. Combining both cases, we see that

$$s_i(\log h_i + O(1)) \le 9 q_{i+1}\left(\log \frac{q_{i+1}}{m_{i+1}} + O(1)\right) + O(q_i).$$

This implies that

$$E(A) = \sum_{i=1}^{r} O\left(q_i\left(\log \frac{q_i}{m_i} + 1\right)\right).$$

This implies that $E(A) = O(E(\hat{\tau}))$, which in turn implies that $E(A) = O(H(D) + 1)$.

The expected number of rounds required by this protocol is 6, which follows
from the fact that to process each set $X_i$, only 2 rounds are required. Conditioned
on the fact that no previous set has contained the string held by the
client, each set contains this string with probability at least $\frac{1}{3}$, and thus the
expected number of sets $X_i$ that need to be processed is 3.
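
The round count can be checked with a short calculation. The sketch below assumes the worst case in which each set contains the client's string with conditional probability exactly 1/3, sums the geometric series for the expected number of sets processed, and doubles it to count rounds. The function name and the truncation length are illustrative choices.

```python
def expected_sets(p, terms=400):
    """Truncated sum of k * p * (1 - p)^(k - 1): the mean number of sets
    examined when each set succeeds independently with probability p."""
    return sum(k * p * (1 - p) ** (k - 1) for k in range(1, terms + 1))

sets_processed = expected_sets(1 / 3)   # close to 3
rounds = 2 * sets_processed             # two rounds per set, close to 6

print(round(sets_processed, 6), round(rounds, 6))
```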

The server sends three kinds of bits to the client: bits that represent the
number $\lceil \log h_j \rceil$, bits that describe the hash function to be used, and bits that
represent strings that map to the same image of the hash function as $x$, the
string held by the client. For any set $X_j$, the number of bits needed to represent
$\lceil \log h_j \rceil$ is $\log \log h_j + o(\log h_j) \le \log n + o(\log n)$. The number of bits needed
to describe the hash function is $2n$. Since we have a pairwise independent hash
function, for each set $X_i$ that is examined the expected number of strings that
map to the same image as $x$, not counting $x$ itself, is $\le 1$. The expected
number of sets $X_i$ examined is 3, and thus the expected total number of bits
representing strings other than the string $x$ is $3n$. In addition, the string $x$ is
sent when processing the last set. Thus, the total expected number of bits sent
by the server is $6n + o(n)$. ■

We  also  point  out  that  although  the  constants  provided  by  this  proof  are 
larger  than  the  constants  we  provide  for  protocol  Computation-efficient,  in 
the case that for all $x_i$, $D(x_i)$ is an inverse power of 2, protocol Round-efficient
can  be  made  into  a  (|n,  +  2^,3)  protocol.  Furthermore,  if  a  shared 

source  of  randomness  is  allowed  (i.e.,  if  the  hash  function  is  chosen  beforehand), 
then  this  can  be  further  improved  to  a  (|n,  2H{D)  -{-  0(1),  2^,  3)  protocol. 

Neither  of  the  previous  protocols  is  optimal  in  terms  of  both  computation 
and  the  number  of  rounds  required.  We  next  show  that  we  can  smoothly  trade 
off  between  the  number  of  black  box  queries  required  and  the  number  of  rounds 
required. 

Protocol Computation-Rounds-Tradeoff(c)

For  c  a  positive  integer  between  1  and  n,  repeat  the  following  until  the  entire 
string  is  known: 

♦ Conditioning on all information known thus far, the server finds a prefix
of the unknown bits that either occurs with probability between $\frac{1}{3}$ and $\frac{2}{3}$,
or, if that is not possible, extends to the end of the string.

♦  If  the  length  of  this  prefix  is  <  c  and  the  prefix  does  not  extend  to  the 
end  of  the  string,  then  protocol  Round-efficient  is  used  to  determine  the 
next  c  unknown  bits,  where  probabilities  are  conditioned  on  the  value  of 
the  bits  determined  so  far. 

♦  Otherwise,  the  server  sends  the  prefix  to  the  client. 

-  If  the  prefix  matches  the  client’s  string  exactly,  the  client  responds 
with  a  “y” ;  otherwise  the  client  responds  with  an  “n” . 
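
The prefix-finding step in the first bullet can be sketched as follows. This is a minimal illustration, assuming the target range is $[\frac{1}{3}, \frac{2}{3}]$ and replacing the server's black box queries with a direct sum over an explicitly listed distribution; the names `prefix_prob` and `find_prefix` are hypothetical, and `known` is assumed to have positive probability.

```python
from fractions import Fraction

def prefix_prob(dist, prefix):
    """Total probability of all strings that start with prefix."""
    return sum(p for s, p in dist.items() if s.startswith(prefix))

def find_prefix(dist, known=""):
    """Extend `known` greedily by the more likely next bit until the
    conditional probability of the extension lies in [1/3, 2/3] or the
    string ends. Returns (new_bits, conditional_probability)."""
    n = len(next(iter(dist)))
    base = prefix_prob(dist, known)
    prefix, cond = known, Fraction(1)
    while len(prefix) < n:
        p0 = prefix_prob(dist, prefix + "0")
        p1 = prefix_prob(dist, prefix + "1")
        prefix += "0" if p0 >= p1 else "1"
        cond = Fraction(max(p0, p1)) / Fraction(base)
        if Fraction(1, 3) <= cond <= Fraction(2, 3):
            break
    return prefix[len(known):], cond

uniform = {s: Fraction(1, 4) for s in ("00", "01", "10", "11")}
skewed = {"00": Fraction(9, 10), "01": Fraction(1, 20), "10": Fraction(1, 20)}
print(find_prefix(uniform), find_prefix(skewed))
```

The greedy choice is safe because a prefix with conditional probability above 2/3 always has a child with conditional probability above 1/3, so the search either lands in the target range or follows a path of probability above 2/3 to the end of the string.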

Theorem 3 Protocol Computation-Rounds-Tradeoff(c) is a
$(O(n), O(H(D) + 1), O(\frac{n 2^c}{c}), O(\frac{n}{c}))$ protocol.

Proof: We first show that the expected number of bits sent by the client is
$O(H(D) + 1)$. The possible "y" and "n" bits sent by the client define a tree
$\nu$, as before. We compare the expected codeword length of $\nu$ to the expected
codeword length of a related code $\hat{\nu}$ for the distribution $D$. In order to define $\hat{\nu}$,
we first need to define some notation. For a given distribution $D$, let $k_1, \ldots, k_z$
be some canonical ordering of all possible calls to Round-efficient that can be
made over all possible strings held by the client. In $k_i$, there is some distribution
$D_i$ on the $c$ bits to be determined, where $D_i$ depends on $D$, and on what
information about the string held by the client has been determined by the
server prior to the call $k_i$. Let $\tau_i$ be the subset of the nodes of $\nu$ that can be
reached during call $k_i$ on some string held by the client.

Let $\hat{\tau}_i$ be the comparison code (as defined in the proof of Theorem 2) for
the distribution $D_i$. The code $\hat{\nu}$ is produced by starting with the code $\nu$, and
replacing each set of nodes $\tau_i$ with the comparison code $\hat{\tau}_i$. The nodes of $\nu$
that are descendants of the leaf of $\tau_i$ representing the $c$-bit string $x_j$ become
descendants of the leaf in $\hat{\tau}_i$ that also represents $x_j$. We saw in the proof of
Theorem 2 that the expected height of any tree $\tau_i$ is at most a constant factor
larger than that of the corresponding tree $\hat{\tau}_i$, and thus the expected codeword
length of the code $\nu$ is at most a constant factor larger than the expected
codeword length of $\hat{\nu}$.

We next show that the expected number of bits used in the code $\hat{\nu}$ is
$O(H(D) + 1)$. In the proof of Theorem 1, we saw that if there was at most
one unbalanced node in the path from the root to any leaf in the tree representing
a code, then the expected codeword length of that code is $O(H(D) + 1)$.
The proof here is complicated by the fact that a path may pass through one
unbalanced node for each set of nodes $\hat{\tau}_i$ that it passes through.

However, we only make a call to Round-efficient if we have found a prefix
of the $c$ bits in question that occurs with probability at most $\frac{2}{3}$. This implies that
given that we enter $\tau_i$, the maximum likelihood leaf of $\hat{\tau}_i$ occurs with probability
$\le \frac{2}{3}$. This means that for all $i$, the root node of $\hat{\tau}_i$ is balanced. This in turn
implies that on any path of length $l$ from the root of $\hat{\nu}$ to a leaf of $\hat{\nu}$, there can
never be two consecutive unbalanced nodes, with the possible exception of the
last two nodes. Thus, for a path of length $l$, the number of unbalanced nodes
is at most $\frac{l}{2} + 1$. The number of balanced nodes on any path from the root
to a leaf $x_j$ is at most $\log_{3/2} \frac{1}{D(x_j)}$, and thus the length of the path to $x_j$ is
$O(\log \frac{1}{D(x_j)} + 1)$. It follows that the expected number of bits used in the code
$\hat{\nu}$ is $O(H(D) + 1)$.

To see that the expected number of bits sent by the server is $O(n)$, it is easy
to bound the expected number of bits sent by calls to Round-efficient, and
by the remainder of the protocol separately, using the techniques developed in
the proofs of Theorems 2 and 1 respectively. Specifically, the expected number
of bits transmitted by the server when using Computation-efficient is $O(1)$
for each bit of the string held by the client, for a total of $O(n)$. Also, for each
use of Round-efficient, the expected number of bits sent by the server is $O(c)$,
and there can be at most $\frac{n}{c}$ uses of this protocol.

The bound on the number of black box queries follows from the fact that the
expected number of black box queries used to determine prefixes of the string
is at most $O(n)$, and the expected number of black box queries used for each of
at most $\frac{n}{c}$ calls to Round-efficient is at most $2^c$. Since $\frac{n 2^c}{c} \ge n$, the number
of black box queries is $O(\frac{n 2^c}{c})$.

The bound on the number of rounds required follows from the fact that the
total expected number of rounds required for all calls to Round-efficient is at
most $O(n/c)$. Since each of the prefixes either has length at least $c$ or extends to
the end of the string, and each one is a success with probability at least $\frac{1}{3}$, the
expected number of prefixes sent is also at most $O(n/c)$. Note that the number
of prefixes extending to the end of the string is $O(1)$, since each one is a success
with probability $\ge \frac{2}{3}$. ■
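
To make the tradeoff concrete, the sketch below tabulates the two quantities that depend on $c$ in the bounds derived in this proof, reading them as $n 2^c / c$ black box queries and $n / c$ rounds with constant factors ignored. The function name and the sample values of $n$ and $c$ are illustrative.

```python
def tradeoff(n, c):
    """Return (query_bound, round_bound), both up to constant factors."""
    return n * 2 ** c / c, n / c

for c in (1, 2, 4, 8):
    queries, rounds = tradeoff(64, c)
    print(f"c={c}: queries ~ {queries:.0f}, rounds ~ {rounds:.0f}")
```

Small $c$ keeps the server's computation near linear at the cost of many rounds, while large $c$ shortens the interaction but makes the query count grow exponentially in $c$.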

3  Lower  Bounds  on  Bits  Sent  by  The  Client 
and  The  Server 

Theorem 4, below, implies that our protocols, for all of which the expected
number of bits sent by the client is $O(H(D))$, are optimal in terms of this
measure.

Theorem 4 (Shannon [13]) For any distribution $D$, the expected number of
bits sent by the client is at least $H(D)$, where $H(D)$ is Shannon's entropy of the
distribution $D$.
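
Theorem 4 can be illustrated numerically for the easier setting in which both parties know $D$. The sketch below compares the entropy with the expected codeword length of a Huffman code, a standard optimal prefix code that is used here only for illustration and is not one of the protocols in this paper; the function names are assumptions.

```python
import heapq
import math

def entropy(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_expected_length(probs):
    """Expected codeword length of a Huffman code: each merge of the two
    lightest subtrees adds one bit to every string below the merged node."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

dyadic = [0.5, 0.25, 0.125, 0.125]
skewed = [0.9, 0.1]
print(entropy(dyadic), huffman_expected_length(dyadic))
print(entropy(skewed), huffman_expected_length(skewed))
```

For the dyadic example the two values coincide, while for the skewed example the code length exceeds the entropy, as the lower bound requires.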

Shannon’s  lower  bound  holds  even  if  both  the  client  and  the  server  know 
the  distribution.  In  our  scenario,  only  the  server  knows  the  distribution,  and 
this  can  only  increase  the  number  of  bits  required.  In  the  lower  bounds  proved 
from  this  point  forward,  it  will  be  crucial  that  the  client  does  not  know  the 
distribution. 

We  next  prove  a  lower  bound  on  the  number  of  bits  that  need  to  be  sent  by 
the  server.  To  do  this,  we  show  that  when  the  distribution  is  chosen  from  a  broad 
class  of  distributions,  then  the  expected  total  number  of  bits  that  need  to  be 
sent  is  at  least  n.  This  demonstrates  that  all  of  our  algorithms  are  existentially 
optimal,  in  terms  of  the  number  of  bits  sent  by  the  server,  for  any  protocol 
where the client sends $O(H(D))$ bits. We demonstrate in Section 3.1 that there
are  distributions  where  the  total  number  of  bits  sent  can  actually  be  much  less 
than  n.  However,  we  also  demonstrate  that  designing  a  protocol  that  uses  close 
to  the  minimum  total  number  of  bits  for  all  distributions  is  not  possible. 

Definition 1 A distribution $D$ over strings $\{0,1\}^n$ is onto, if for any string
$x_i \in \{0,1\}^n$, $D(x_i) > 0$.

Definition 2 A multiset of distributions is symmetric if the distributions in
the set can be partitioned into subsets such that within each subset, (1) each
distribution appears only once, and (2) for any distribution $D_1$ in the subset
and for any permutation $\pi$ of the $2^n$ strings $x_i$, there is a distribution $D_2$ in the
same subset, such that for all $x_i$, $D_1(x_i) = D_2(\pi(x_i))$. We call this partition
the balancing partition.

Intuitively,  a  multi-set  of  distributions  is  symmetric  if  no  preference  is  given 
to  any  specific  string. 
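
Definitions 1 and 2 can be made concrete for a toy case. The sketch below builds, for $n = 2$, the set of all distinct distributions obtained by permuting one base probability vector, and checks that the set is closed under permutations of the strings and that every member is onto. The base vector and the helper names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

def symmetric_set(base):
    """All distinct distributions obtained by permuting the base vector.
    Position k of a tuple is the probability of the k-th string."""
    return set(permutations(base))

def is_onto(dist):
    """True when every string has positive probability."""
    return all(p > 0 for p in dist)

def is_closed_under_permutations(dists):
    """True when permuting the strings of any member gives another member."""
    size = len(next(iter(dists)))
    return all(tuple(d[k] for k in perm) in dists
               for d in dists for perm in permutations(range(size)))

base = (Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8))
B = symmetric_set(base)
print(len(B), all(is_onto(d) for d in B), is_closed_under_permutations(B))
```

Here the balancing partition consists of a single subset, which is the case treated first in the proof of Theorem 5.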

Theorem 5 For any protocol and for any distribution chosen uniformly at random
from any set of onto distributions that is symmetric, the expected total
number of bits sent by the client and the server is at least $n - 1$.

Proof: We prove the Theorem for the case where the balancing partition
consists  of  a  single  subset.  When  the  balancing  partition  consists  of  many 
subsets,  choosing  a  distribution  is  equivalent  to  first  choosing  a  subset  in  the 
balancing  partition,  and  then  choosing  a  distribution  from  within  that  subset. 
Thus,  the  result  for  a  set  with  only  a  single  subset  in  the  balancing  partition 
implies  the  more  general  result  stated  in  the  Theorem. 

Any set with only a single subset in the balancing partition can be described
by a sequence of pairs $PN = (p_1, N_1), (p_2, N_2), \ldots, (p_k, N_k)$, where $p_i > p_j$ for
$i < j$ and where for each distribution in the set, there are exactly $N_i$ strings that
occur with probability $p_i$. Let $x_l$ be the string given initially to the client, and let
$D_m$ be the distribution given initially to the server. Here, $l$ represents the index
of the string held by the client, and $m$ represents the index of the distribution
held by the server. Let $I$ be the unique value such that $p_I = D_m(x_l)$. We here
prove a lower bound for the problem where both the client and the server know
the sequence $PN$ at the start of the protocol, and the server must determine
not only the string $x_l$, but also the value $I$. Since this problem requires no more
communication than the original problem, a lower bound for this problem also
applies to the original problem.

Let $E_i$ be the expected total number of bits sent by the client and the server,
conditioning on $I = i$, where this expectation is taken over both the random
choice of $D_m$ and the choice of $x_l$ using the chosen distribution $D_m$. We prove
that for any fixed $i$, $1 \le i \le k$, $E_i \ge n - 1$. Since the actual value of $I$ computed
by the server is a distribution over $1 \le i \le k$, this suffices to prove the Theorem.
Note that for every $D_m$, the client must send a different set of bits to the server
for every string $x_l$. Also, all of the $N_i$ strings where $I = i$ occur with equal
probability, and thus when $N_i \ge 2^{n-1}$, $E_i \ge n - 1$.

To show that $E_i \ge n - 1$ when $N_i < 2^{n-1}$, we view the communication task
of the server determining $I$ as a communication matrix, where the rows of the
matrix represent the input given to the client (i.e., there are $2^n$ rows), and the
columns represent the distribution given to the server (i.e., there are
$\frac{(2^n)!}{N_1! N_2! \cdots N_k!}$ columns). The entry of the matrix in row $l$ and column $m$ contains the value
$I$ when the server starts with distribution $D_m$ and the client starts with string
$x_l$. We say that an input to such a problem is the pair $(l, m)$.

We use a technique based on the idea of monochromatic rectangles, a common
technique in communication complexity developed in [14]. This technique
uses the fact that any communication protocol partitions the communication
matrix into rectangles, each consisting of the matrix entries in the intersection
of a specific subset of the rows of the communication matrix with a specific subset
of the columns of the matrix. The transcript of bits communicated by the
client and the server is the same for any two inputs that represent two matrix
entries within the same rectangle, and different for any two inputs that represent
matrix entries in different rectangles. For a proof of this, see for example
[8].

Several classical results in communication complexity show that there must
be a large number of rectangles by showing that each rectangle must be monochromatic:
all entries in the rectangle must contain the same value. This is required
since  typically  both  the  client  and  the  server  are  required  to  know  the  result  at 
the  end  of  the  protocol.  However,  in  the  problem  we  consider,  the  color  of  a 
matrix  entry  is  the  I  value  in  that  entry,  and  only  the  server  is  required  to  know 
I  at  the  end  of  the  protocol.  Thus,  for  the  problem  discussed  here,  within  each 
rectangle,  every  column  must  be  monochromatic.  We  call  such  a  rectangle 
column  monochromatic.  Note  that  only  rectangles  associated  with  distributions 
that  are  onto  are  required  to  be  column  monochromatic.  If  there  were  an  input 
pair  that  is  guaranteed  to  not  occur,  the  server  is  not  required  to  differentiate 
this  input  from  one  that  does  occur,  and  thus  the  input  that  does  not  occur 
could  be  in  the  same  column  of  the  same  rectangle  as  an  input  that  does  occur. 

We provide an upper bound on the number of times the value $i$ can appear
in a single rectangle. Consider any rectangle that consists of $r$ rows. Let $C_i$
be the set of columns of that rectangle that contain the value $i$, and let the $r$
rows be denoted by $R$. For this rectangle to be column monochromatic, for each
string $x_l$ that appears in $R$ and for each $D_m \in C_i$, $D_m(x_l) = p_i$. This means
that $r \le N_i$. By counting the number of ways that the probability of the strings
$x_l$ not in $R$ can be set, this also implies that

$$|C_i| \le \frac{(2^n - r)!}{(N_i - r)! \prod_{j \ne i} N_j!}.$$

Thus, the maximum number of times that $i$ can appear in a single rectangle is

$$r \cdot \frac{(2^n - r)!}{(N_i - r)! \prod_{j \ne i} N_j!}.$$

When $N_i < 2^{n-1}$, this is maximized when $r = 1$. This implies that the
number of $i$'s in a single rectangle is at most $\frac{(2^n - 1)!}{(N_i - 1)! \prod_{j \ne i} N_j!}$. For any rectangle $z$
containing the value $i$, let $q_z$ be the probability that the input pair $(x_l, D_m)$ is
in rectangle $z$, conditioned on $I = i$. Since every input pair that results in $I = i$
occurs with the same probability, when $N_i < 2^{n-1}$, $q_z \le \frac{1}{2^n}$, $\forall z$. The value $\frac{1}{2^n}$ is
obtained by dividing the maximum number of times that $i$ can appear in any
one rectangle of the communication matrix by the total number of times that
it appears.

Let $\mathcal{E}_i$ be the entropy of the transcript of bits sent by the client and the
server. The fact that the transcript of bits sent by the client and the server is
different for every rectangle together with the upper bound on $q_z$ imply that for
any $i$ such that $N_i < 2^{n-1}$, $\mathcal{E}_i \ge n$. This in turn implies that $E_i \ge n$. For any $i$
such that $N_i \ge 2^{n-1}$, $E_i \ge n - 1$, and thus the a priori expected number of bits
communicated is at least $n - 1$. ■
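
The counting step in this proof can be checked on a small instance. The sketch below assumes the following reading of the argument: if a rectangle has $r$ rows, each column containing the value $i$ must give all $r$ row strings probability $p_i$, so the number of such columns is at most the number of ways to assign the remaining probabilities to the other $2^n - r$ strings. The multinomial formula, the function names, and the example counts are assumptions made for illustration.

```python
from fractions import Fraction
from math import factorial

def multinomial(total, parts):
    """Number of ways to split `total` labelled items into groups of the
    given sizes (the sizes must sum to total)."""
    assert sum(parts) == total
    result = factorial(total)
    for part in parts:
        result //= factorial(part)
    return result

def max_entries(n, counts, i, r):
    """Upper bound on how often value i appears in a rectangle with r rows."""
    rest = list(counts)
    rest[i] -= r
    return r * multinomial(2 ** n - r, rest)

def total_entries(n, counts, i):
    """Number of entries of the whole matrix that are equal to i."""
    return counts[i] * multinomial(2 ** n, counts)

# n = 3, class sizes N = (1, 3, 4); the class with index 1 has N_i = 3 < 2^(n-1).
n, counts, i = 3, [1, 3, 4], 1
bounds = [max_entries(n, counts, i, r) for r in range(1, counts[i] + 1)]
ratio = Fraction(bounds[0], total_entries(n, counts, i))
print(bounds, ratio)
```

In this instance the bound is largest at $r = 1$, and dividing it by the total number of entries equal to $i$ gives $1/2^n$, matching the bound on $q_z$ used in the proof.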

3.1  Minimizing  the  Number  of  Bits  Sent 

Theorem 5 shows that the protocols we have presented are existentially optimal.
That is, for many natural sets of distributions given to the server, the protocols
are in fact optimal. The whole picture, on the other hand, is more involved.
Consider for example the following distribution, described by giving a technique
of choosing strings from the distribution. The last $\log n$ bits of the $n$-bit
string $x_i$ are chosen by independent flips of a fair coin. Let $t$ be the value of the
binary number represented by these bits. When $t < n - \log n$, the remainder of
the bits in the string are set to 0, except the $t$th bit, which is set to 1. When
$t \ge n - \log n$ all the remaining bits are set to 0.
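
The construction can be written out for a small $n$. The sketch below enumerates the support of this distribution for $n = 16$, under the assumptions that $n$ is a power of two, that bit positions are indexed from 0, and that the position set to 1 is position $t$; these indexing conventions and the function name are not fixed by the text.

```python
import math

def build_string(n, t):
    """String of length n determined by the value t of its last log2(n) bits."""
    k = int(math.log2(n))
    head = ["0"] * (n - k)
    if t < n - k:
        head[t] = "1"          # mark position t in the first n - log n bits
    return "".join(head) + format(t, f"0{k}b")

n = 16
support = [build_string(n, t) for t in range(n)]
# All n strings are distinct and equally likely, so the entropy is log2(n).
entropy = math.log2(len(set(support)))
print(support[2], support[13], entropy)
```

Since the last $\log n$ bits determine the whole string, the entropy is only $\log n$ even though the strings are $n$ bits long.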

For the distribution $D_s$, the entire string is determined by the last $\log n$ bits.
However, protocol Computation-efficient sends possible matches for prefixes
of the string, and in order to find a prefix that occurs with probability between
$\frac{1}{3}$ and $\frac{2}{3}$, the server has to send a prefix to the client that has length $\lceil \frac{n}{3} \rceil$. For
distribution $D_s$, we would be much better off with a protocol that specifies an
order on the bits that the server is trying to match. This could be done for
$D_s$ by having the server send $\log^2 n$ bits indicating which $\log n$ bit positions to
consider first.

Unfortunately, it is not sufficient for the server to simply specify the order in
which the bits should be examined. Consider for example the distribution $D_{s'}$,
where the string consists of $n - 1$ 0s and a single 1, placed uniformly at random.
Here, regardless of what order the server attempts to match bits held by the
client, the number of bits that must be matched in order to find a successful
match with probability between $\frac{1}{3}$ and $\frac{2}{3}$ is at least $\frac{n}{3}$. However, this distribution has a
very short description ($O(\log n)$ bits) that could be sent to the client. Once the
client knows what the distribution is, it can simply send the server $\log n$ bits
describing where the 1 is in its string.

Let $OPT(D)$ represent the minimum expected total number of bits sent when
the client has a sample drawn from the distribution $D$, which is known only to
the server. The distributions $D_s$ and $D_{s'}$ serve to demonstrate that distributions
do exist where none of the protocols we have presented are guaranteed to use
the optimal total number of bits, or even within a constant factor of the optimal
number of bits. However, we next show that it is not possible for any protocol to
use $OPT(D)$ bits for every distribution $D$. In fact, we show that any function
that provides a non-trivial approximation to $OPT(D)$ for every distribution
$D$ is not even recursive! Thus, although our protocols are guaranteed to be
optimal only for broad classes of distributions and not for all distributions, no
other protocol could guarantee to be optimal for all distributions.

The proof of the following Theorem uses Kolmogorov complexity, and is in
fact motivated by Kolmogorov's proof (see for example [9]) that the Kolmogorov
complexity of a string is not a recursive function. Recall that the Kolmogorov
complexity of a string $x$, which we here denote $\mathcal{K}(x)$, is the minimum description
length of the string $x$. We here use the definition that $\mathcal{K}(x)$ is the size of the
smallest description of a Turing machine with a work tape but no input tape
that can produce the string $x$ on an output tape.

Theorem 6 Let $f(D)$ be any function from distributions $D$ to $\mathbb{R}$ such that
$f(D) \le OPT(D)$ and as $OPT(D) \to \infty$, it is also the case that $f(D) \to \infty$.
Such a function $f(D)$ is not recursive.

Proof: Let $f(D)$ be any such function. We assume that $f(D)$ is recursive
and reach a contradiction. We assign a distribution $D_i$ for each natural number
i,  where  A*  is  the  distribution  over  n  ^  [log  i  [-bit  strings  where  the  string  cor¬ 
responding  to  the  binary  representation  of  j  =  i  — occurs  with  probability 
1  ~  and  all  other  strings  occur  with  equal  probability. 

Using these distributions $D_i$ and the function $f$, we define a new function,
$F(m)$, defined for any natural number $m$. $F(m)$ is the smallest $i$ such that
$f(D_i) \ge m$. Let $B_n$ be the set of all $2^n$ distributions $D_i$ over $n$-bit strings.
The set $B_n$ is symmetric, and each distribution $D_i$ is onto, and thus Theorem
5 implies that some distribution $D_i \in B_n$ is such that $OPT(D_i) \ge n - 1$. This
means that the value of $OPT(D_i)$ takes on arbitrarily large values as $i$ increases.
Since $f(D_i)$ increases without bound as $OPT(D_i)$ increases, we see that $F(m)$
is well defined for each natural number $m$.

Claim 1 $\mathcal{K}(F(m)) \ge m - c$, for some constant $c$.

Proof (of Claim): By our construction, $OPT(D_{F(m)}) \ge m$, so it suffices to
show that for all $i$, $\mathcal{K}(i) \ge OPT(D_i) - c$ for some constant $c$. This follows
from the fact that $OPT(D_i)$ can be at most an additive constant larger than
the expected number of bits used in the following protocol: if the distribution
received by the server is one of the $D_i$, the server sends the client a 1 followed by
the description of the string $i$ of length $\mathcal{K}(i)$. The client responds to the server
with a 1 if it indeed has the string $j = i - 2^{n-1}$ and a 0 followed by the actual
value of the string otherwise. When the distribution received by the server is
not one of the $D_i$, the server sends the client a 0, and this is followed by any
other protocol. Note that in such a protocol, when the server has a distribution
$D_i$, the expected total number of bits used is at most $\mathcal{K}(i) + 3$. ■

However, by the assumption that $f(D_i)$ is recursive, we can describe the
string $F(m)$ simply by the value $m$. This is sufficient to determine $F(m)$, since
we can compute for each $i$, in increasing order, $f(D_i)$ until we find the first $i$
such that $f(D_i) \ge m$. Thus, $\mathcal{K}(F(m)) \le \log m + c'$, for some constant $c'$. Since
$F(m)$ is defined for all natural numbers $m$, we have reached a contradiction. ■

Consider for example any approximation function for $OPT(D)$ that is guaranteed
to return a value $g(D)$ such that $a^{-1}(OPT(D)) \le g(D) \le a(OPT(D))$,
where $a$ and $a^{-1}$ are Ackermann's function and its inverse, respectively. The
function $f(D) = a^{-1}(g(D)) \le OPT(D)$, and $f(D)$ grows without bound as
$OPT(D)$ grows, albeit very slowly. Thus, $f(D)$ is not recursive, which in turn
implies that $g(D)$ is also not recursive.

4 Lower Bounds on Computation and Rounds

We show that protocol Computation-efficient is existentially within a constant
factor of the best possible in terms of the number of black box queries
required, for any protocol that does not require a large number of bits to be
sent by the client.

Theorem 7 For any entropy $H$, there is a set of distributions $B_H$, all with
entropy $H$, such that for any protocol $P$, when $D$ is chosen uniformly at random
from $B_H$, the number of bits sent by the client plus the number of black box
queries performed by the server is at least $n$.

Proof: The set $B_H$ consists of $2^n$ distributions: one for each $n$-bit string $x_i$.
In the $i$th distribution $D_i$, string $x_i$ occurs with probability $p$, and all remaining
strings occur with probability $\frac{1-p}{2^n - 1}$, where $p$ is chosen so that the resulting
entropy of $D_i$ is exactly $H$. For this set of distributions, for any black box
query, the response is always one of two results, both of which are known by
the server a priori, provided that the server knows the set of distributions being
used. Specifically, if the query specifies any $k$ bits (leaving $n - k$ bits as wild
cards), then the two possible answers are $2^{n-k} \cdot \frac{1-p}{2^n - 1}$ (in the case where the
single likely string does not match the query), and $(2^{n-k} - 1) \cdot \frac{1-p}{2^n - 1} + p$ (in the
case where the single likely string does match the query).
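
The two possible query answers can be verified by brute force on a small instance. In the sketch below a black box query is modelled, as an assumption, by a pattern over `0`, `1` and `*` (wild card) whose answer is the total probability of the matching strings; the function names and parameters are illustrative.

```python
from fractions import Fraction
from itertools import product

def matches(pattern, s):
    """True when string s agrees with every specified bit of the pattern."""
    return all(c in ("*", b) for c, b in zip(pattern, s))

def query_brute_force(n, likely, p, pattern):
    """Sum the probabilities of all n-bit strings matching the pattern."""
    other = (1 - p) / (2 ** n - 1)
    total = Fraction(0)
    for bits in product("01", repeat=n):
        s = "".join(bits)
        if matches(pattern, s):
            total += p if s == likely else other
    return total

def query_closed_form(n, likely, p, pattern):
    """Closed-form answer, with k the number of specified bits."""
    k = sum(c != "*" for c in pattern)
    other = (1 - p) / (2 ** n - 1)
    if matches(pattern, likely):
        return (2 ** (n - k) - 1) * other + p
    return 2 ** (n - k) * other

p = Fraction(1, 2)
for pattern in ("10**", "01**", "****", "1010"):
    print(pattern, query_closed_form(4, "1010", p, pattern))
```

Each answer therefore reveals only whether the likely string matches the query, which is why a query behaves like a single yes/no node in the decision tree.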

The actions of the server can be viewed as a decision tree, where each node of
the tree represents either a bit received from the client or a black box query, and
each leaf represents an output produced by the server. Each of the $2^n$ possible
strings held by the client must result in reaching a different leaf, and thus the
average height of the leaves must be at least $n$. ■

Protocol  Round-efficient  is  trivially  within  a  constant  factor  of  the  best 
possible  in  terms  of  the  expected  number  of  rounds  required.  We  show  here 
that  a  protocol  that  always  completes  in  a  single  round  would  require  either 
the  number  of  bits  sent  by  the  client  to  be  much  larger  than  the  minimum,  or 
the  number  of  bits  sent  by  the  server  to  be  much  larger  than  the  minimum. 
A  single-round  protocol  is  defined  as  a  protocol  where  the  server  sends  some 
number  of  bits  to  the  client,  and  the  client  responds  with  some  number  of  bits 
back  to  the  server,  at  which  time  the  server  knows  the  string  held  by  the  client. 

Theorem  8  Let  H  be  any  entropy  and  let  P  be  any  single-round  protocol  where 
the  expected  number  of  bits  sent  by  the  client  is  at  most  c  •  where  c  '  H  < 
There  is  a  set  of  distributions  B’^,  all  with  entropy  H ,  such  that  when  D  is 
chosen  uniformly  at  random  from  B'jj,  the  expected  number  of  bits  sent  by  the 
server  using  P  is  at  least 

Proof:  By  Theorem  4,  we  can  assume  that  c  >  1.  We  show  that  the  Theorem 
is  true  for  the  following  set  of  distributions  B^.  There  are  distributions. 
one  for  each  subset  of  2^  of  the  2”  strings.  Call  such  a  subset  a  likely  subset.  For
each  distribution,  the  chosen  string  is  one  of  the  strings  in  the  likely  subset  with 
probability  1  —  and  one  of  the  other  strings  with  the  remaining  probability. 
The  strings  in  the  likely  subset  each  occur  with  equal  probability,  as  do  the 
strings  not  in  the  likely  subset. 

We  show  that  any  single-round  protocol  P  for  this  set  of  distributions,  where 
the  server  sends  a  small  number  of  bits,  would  imply  a  protocol  for  the  following 
problem  that  violates  an  easy  lower  bound  for  that  problem. 

Definition 3 The subset identification problem: the server has an M-bit binary
string I, containing exactly m 1's, where M and m are known in advance by
both the server and the client. The server is allowed to send bits to the client,
but no bits pass in the other direction. The task is to inform the client of the
string I.

Note that since there are $\binom{M}{m}$ possible inputs to the problem, on average the
server must send at least $\log \binom{M}{m}$ bits to the client. Thus, the following lemma directly
implies the Theorem.
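
To get a feel for the size of this counting bound, the following sketch evaluates $\log_2 \binom{M}{m}$ for small illustrative values and compares it with the standard estimate $m \log_2 (M/m)$. The function name and the sample values of M and m are assumptions made for the example; the estimate is a well-known inequality and is not taken from the text.

```python
import math

def subset_id_lower_bound(M, m):
    """log2 of the number of M-bit strings containing exactly m ones."""
    return math.log2(math.comb(M, m))

# Illustrative values only.
M, m = 2 ** 10, 2 ** 3
exact = subset_id_lower_bound(M, m)
estimate = m * math.log2(M / m)   # uses binom(M, m) >= (M / m) ** m
print(exact, estimate)
```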

Lemma 1 Any single-round protocol A for the set of distributions $B_H^n$, where
the expected number of bits the server sends to the client is at most x, and the
expected number of bits the client sends to the server is at most cH, implies
a protocol B for the subset identification problem, with $M = 2^n$ and m = 2^ ,
where, on average, the server sends x + | $\log \binom{M}{m}$ bits to the client.

Proof (of Lemma): The protocol B proceeds as follows. The server examines
the $M = 2^n$-bit input string I and determines the m = 2^ bits that are set to
1. The server then sends the bits to the client that would be sent to the client
during protocol A with the distribution that has the likely subset containing
the strings corresponding to the locations of the 1's in I.

The client and the server then separately determine the same $2^n$-bit string
I', which is an approximation to the string I. I' is determined as follows: for
each of the $2^n$ n-bit strings $x_i$, if in protocol A the client, when holding $x_i$,
responds with at most 4cH bits, then the ith bit in I' is set to a 1, and otherwise
it is set to a 0. The server then sends the client enough information to correct
I' to I, which can be done efficiently because of the following:

Claim 2 The number of 1's in I' is at most $2^{4cH+1}$. Furthermore, the number of
bits that are 1's in I that are not 1's in I' is at most

Proof (of Claim): Since the server always knows what string the client has
at the end of protocol A, and the client may hold any string, the
client must send a different sequence of bits on each string. Thus, the number of strings
where the client sends at most 4cH bits is at most $2^{4cH+1}$. Since the expected
number of bits that the client sends is at most cH, the expected number of bits
sent by the client, given that the string is a likely string, is at most
Thus, by Markov's inequality, the fraction of likely strings where the client sends
more than 4cH bits is < |. ■
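
The two facts used in this argument can be checked numerically. The sketch below counts the distinct binary strings of length at most k by enumeration (there are $2^{k+1} - 1$ of them, including the empty string) and checks Markov's inequality on a small list of response lengths. The helper names and the sample data are hypothetical and serve only as an illustration.

```python
from itertools import product

def count_strings_up_to(k):
    """Count distinct binary strings of length 0..k by direct enumeration."""
    return sum(1 for length in range(k + 1)
                 for _ in product("01", repeat=length))

def fraction_above(values, threshold):
    """Fraction of `values` strictly greater than `threshold`."""
    return sum(v > threshold for v in values) / len(values)

k = 5
print(count_strings_up_to(k), 2 ** (k + 1) - 1)

# Hypothetical response lengths; Markov's inequality bounds the fraction
# exceeding t times the mean by 1 / t for nonnegative values.
lengths = [1, 2, 2, 3, 3, 3, 4, 20]
mean = sum(lengths) / len(lengths)
t = 4
print(fraction_above(lengths, t * mean), 1 / t)
```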

The server only needs to send the client the locations of the 1's in I that are
0's in I', and the locations of the 1's in I within the 1's in I'. The former requires
at most ^ $\log 2^n$ bits, and the latter requires at most 2^ $\log 2^{4cH+1}$ bits. Using
the fact that 4cH < this is < | $\log \binom{M}{m}$ bits, and the lemma follows. ■ ■
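
The correction step can be sketched as follows, with bit strings represented as Python lists. This is a minimal illustration under stated assumptions, not the protocol's actual encoding: the function names are hypothetical, and the decoder is assumed to know how many entries each of the two lists contains.

```python
def encode_correction(I, I_approx):
    """Return (missing, kept): positions of the 1's of I that are 0's in
    I_approx, and indices, within the list of 1's of I_approx, of the
    positions that are also 1's in I."""
    ones_approx = [i for i, b in enumerate(I_approx) if b == 1]
    missing = [i for i, b in enumerate(I) if b == 1 and I_approx[i] == 0]
    kept = [j for j, i in enumerate(ones_approx) if I[i] == 1]
    return missing, kept

def decode_correction(I_approx, missing, kept):
    """Rebuild I from I_approx and the two lists sent by the server."""
    ones_approx = [i for i, b in enumerate(I_approx) if b == 1]
    I = [0] * len(I_approx)
    for i in missing:
        I[i] = 1
    for j in kept:
        I[ones_approx[j]] = 1
    return I

def correction_cost_bits(I_approx, missing, kept):
    """Bits used if each entry is written as a fixed-width index."""
    width_full = (len(I_approx) - 1).bit_length()        # index into all positions
    width_ones = max(sum(I_approx) - 1, 0).bit_length()  # index into 1's of I_approx
    return len(missing) * width_full + len(kept) * width_ones

I        = [0, 1, 0, 0, 1, 0, 1, 0]
I_approx = [1, 1, 0, 0, 0, 0, 1, 1]
missing, kept = encode_correction(I, I_approx)
print(missing, kept, correction_cost_bits(I_approx, missing, kept))
```

Positions inside the 1's of I' are cheaper to name than arbitrary positions whenever I' has few 1's, which is the reason the two cases are encoded separately.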